Abstract

Knowledge bases provide applications with the benefit of easily accessible, systematic relational
knowledge but often suffer in practice from their incompleteness and lack of knowledge of new
entities and relations. Much work has focused on building or extending them by finding patterns in
large unannotated text corpora. In contrast, here we mainly aim to complete a knowledge base by
predicting additional true relationships between entities, based on generalizations that can be
discerned in the given knowledge base. We introduce a neural tensor network (NTN) model which
predicts new relationship entries that can be added to the database. This model can be improved by
initializing entity representations with word vectors learned in an unsupervised fashion from text,
and when doing this, existing relations can even be queried for entities that were not present in
the database. Our model generalizes and outperforms existing models for this problem, and can
classify unseen relationships in WordNet with an accuracy of 75.8%.
1 Introduction

Ontologies and knowledge bases such as WordNet [1] or Yago [2] are extremely useful resources
for query expansion [3], coreference resolution [4], question answering (Siri), information retrieval
(Google Knowledge Graph), or generally providing inference over structured knowledge to users.
Much work has focused on extending existing knowledge bases [5, 6, 2] using patterns or classifiers
applied to large corpora.



We introduce a model that can accurately learn to add additional facts to a database using only that 
database. This is achieved by representing each entity (i.e., each object or individual) in the database 
by a vector that can capture facts and their certainty about that entity. Each relation is defined by 
the parameters of a novel neural tensor network which can explicitly relate two entity vectors and is 
more powerful than a standard neural network layer.

Furthermore, our model allows us to ask whether even entities that were not in the database are 
in certain relationships by simply using distributional word vectors. These vectors are learned by a 
neural network model [7] using unsupervised text corpora such as Wikipedia. They capture syntactic
and semantic information and allow us to extend the database without any manually designed rules 
or additional parsing of other textual resources. 

The model outperforms previously introduced related models such as that of Bordes et al. [8]. We
evaluate on a held-out set of relationships in WordNet. The accuracy for predicting unseen relations
is 75.8%. We also evaluate in terms of ranking. For WordNet, there are 38,696 different entities
and we use 11 relationship types. On average for each left entity there are 100 correct entities in a
specific relationship. For instance, dog has many hundreds of hyponyms such as puppy, barker or 
dachshund. In 20.9% of the relationship triplets, the model ranks the correct test entity in the top 
100 out of 38,696 possible entities. 
2 Related Work



There is a vast amount of work extending knowledge bases using external corpora [5, 6, 2], among
many others. In contrast, little work has been done in extensions based purely on the knowledge
base itself. The work closest to ours is that by Bordes et al. [8]. We implement their approach and
compare to it directly. Our model outperforms it by a significant margin in terms of both accuracy 
and ranking. Both models can benefit from initialization with unsupervised word vectors. 

Another related approach is that by Sutskever et al. [10], who use tensor factorization and Bayesian
clustering for learning relational structures. Instead of clustering the entities in a nonparametric
Bayesian framework we rely purely on learned entity vectors. Their computation of the truth of a
relation can be seen as a special case of our proposed model. Instead of using MCMC for inference, 
we use standard backpropagation which is modified for the Neural Tensor Network. Lastly, we do 
not require multiple embeddings for each entity. Instead, we consider the subunits (space separated 
words) of entity names. This allows more statistical strength to be shared among entities. 

Many methods that use knowledge bases as features such as [3, 4] could benefit from a method
that maps the provided information into vector representations. We learn to modify unsupervised 
word representations via grounding in world knowledge. This essentially allows us to analyze word 
embeddings and query them for specific relations. Furthermore, the resulting vectors could be used 
in other tasks such as NER [7] or relation classification in natural language [11].

Lastly, Ranzato et al. [12] introduced a factored 3-way Restricted Boltzmann Machine which is also
parameterized by a tensor.

3 Neural Tensor Networks 

In this section we describe the full neural tensor network. We begin by describing the representation 
of entities and continue with the model that learns entity relationships. 

We compare using both randomly initialized word vectors and pre-trained 100-dimensional word 
vectors from the unsupervised model of Collobert and Weston [13, 7]. Using free Wikipedia text,
this model learns word vectors by predicting how likely it is for each word to occur in its context. 
The model uses both local context in the window around each word and global document context. 
Similar to other local co-occurrence based vector space models, the resulting word vectors
capture distributional syntactic and semantic information. For further details and evaluations of
these embeddings, see [14, 13, 15].

For cases where the entity name has multiple words, we simply average the word vectors. 
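The averaging step can be illustrated with a minimal sketch. The function name `entity_vector`, the dictionary `word_vectors`, and the toy 3-dimensional values are hypothetical and chosen only for illustration; the sketch also assumes every word of the name has a vector available.

```python
import numpy as np

def entity_vector(name, word_vectors):
    """Average the vectors of the space-separated words in an entity name."""
    words = name.split()
    return np.mean([word_vectors[w] for w in words], axis=0)

# Toy 3-dimensional vectors (hypothetical values, for illustration only).
word_vectors = {
    "hot": np.array([1.0, 0.0, 2.0]),
    "dog": np.array([3.0, 2.0, 0.0]),
}

print(entity_vector("hot dog", word_vectors))  # [2. 1. 1.]
```

Because the word "dog" contributes to every entity name that contains it, updates to its vector are shared across those entities.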

The Neural Tensor Network (NTN) replaces the standard linear layer with a bilinear layer that
directly relates the two entity vectors. Let $e_1, e_2 \in \mathbb{R}^d$ be the vector representations of the two entities.
We can compute a score of how plausible they are in a certain relationship $R$ by the following
NTN-based function:

$$g(e_1, R, e_2) = U^T f\left( e_1^T W_R^{[1:k]} e_2 + V_R \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b_R \right), \qquad (1)$$

where $f = \tanh$ is a standard nonlinearity. We define $W_R^{[1:k]} \in \mathbb{R}^{d \times d \times k}$ as a tensor, and the bilinear
tensor product results in a vector $h \in \mathbb{R}^k$, where each entry is computed by one slice of the tensor:

$$h_i = e_1^T W_R^{[i]} e_2. \qquad (2)$$

The remaining parameters for relation $R$ are the standard form of a neural network: $V_R \in \mathbb{R}^{k \times 2d}$
and $U \in \mathbb{R}^k$, $b_R \in \mathbb{R}^k$.

The main advantage of this model is that it can directly relate the two inputs instead of only implicitly
through the nonlinearity. The bilinear model for truth values in [10] becomes a special case of this
model with $V_R = 0$, $b_R = 0$, $k = 1$, $f = \text{identity}$.
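The scoring function for a single relation can be sketched in NumPy as follows. This is an illustrative sketch under the shapes stated in the text, not the authors' implementation; the function name `ntn_score`, the argument layout (tensor slices along the last axis), and the random toy parameters are assumptions.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, U, f=np.tanh):
    """Score one entity pair under one relation's parameters.

    Shapes: e1, e2: (d,); W: (d, d, k); V: (k, 2d); b: (k,); U: (k,).
    """
    # Bilinear tensor product: h[i] = e1^T W[:, :, i] e2, one entry per slice.
    h = np.einsum("a,abi,b->i", e1, W, e2)
    # Standard neural network layer on the concatenated entity vectors.
    standard = V @ np.concatenate([e1, e2]) + b
    return float(U @ f(h + standard))

# Toy parameters (random, for illustration only).
rng = np.random.default_rng(0)
d, k = 4, 3
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, d, k))
V = rng.normal(size=(k, 2 * d))
b = rng.normal(size=k)
U = rng.normal(size=k)
score = ntn_score(e1, e2, W, V, b, U)

# Special case from the text: V = 0, b = 0, k = 1, f = identity is purely bilinear.
bilinear = ntn_score(e1, e2, W[:, :, :1], np.zeros((1, 2 * d)), np.zeros(1),
                     np.ones(1), f=lambda x: x)
```

In the special case the score reduces to `e1 @ W[:, :, 0] @ e2`, which makes the relation to the bilinear model concrete.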

In order to train the parameters W, U, V, E, b, we minimize a contrastive max-margin objective over
the N training triplets, in which we score each correct relation triplet higher than a
corrupted one in which one of the entities was replaced with a random entity. For each correct triplet
we sample C random corrupted entities.
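A contrastive max-margin objective matching this description can be written out as a sketch. The unit margin and the choice to corrupt the second entity are assumptions made here for concreteness, and any regularization term is omitted; $e_c$ denotes a randomly sampled replacement entity.

```latex
J(W, U, V, E, b) = \sum_{i=1}^{N} \sum_{c=1}^{C}
  \max\Bigl(0,\; 1 - g\bigl(e_1^{(i)}, R^{(i)}, e_2^{(i)}\bigr)
                 + g\bigl(e_1^{(i)}, R^{(i)}, e_c\bigr)\Bigr)
```

Each term is zero once the correct triplet outscores its corrupted counterpart by at least the margin, so only violated pairs contribute gradients.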

The model is trained by taking gradients with respect to the five sets of parameters and using mini- 
batched L-BFGS. 
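Evaluating such an objective on a batch of triplets might look like the sketch below. The function name `contrastive_loss`, the margin of 1, and the decision to corrupt only the second entity are assumptions; the optimizer (minibatched L-BFGS in the text) and the gradient computation are not shown.

```python
import random

def contrastive_loss(triplets, entities, score, num_corrupt, margin=1.0, seed=0):
    """Sum of hinge losses over correct triplets and sampled corruptions.

    `score(e1, rel, e2)` is any function that should be high for true triplets.
    """
    rng = random.Random(seed)
    total = 0.0
    for e1, rel, e2 in triplets:
        for _ in range(num_corrupt):
            # Random replacement for the second entity; it may occasionally
            # coincide with the correct entity, which this sketch does not filter.
            e_c = rng.choice(entities)
            total += max(0.0, margin - score(e1, rel, e2) + score(e1, rel, e_c))
    return total

# Toy data (hypothetical). A constant scorer violates the margin on every pair.
triplets = [("dog", "has part", "tail")]
entities = ["tail", "cat", "tree"]
flat = lambda e1, rel, e2: 0.0
print(contrastive_loss(triplets, entities, flat, num_corrupt=3))  # 3.0
```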

4 Experiments 

In our experiments, we follow the data settings of WordNet in [9]. There are a total of 38,696
different entities and 11 relations. We use 112,581 triplets for training, 2,609 for the development 
set and 10,544 for final testing. 

The WordNet relationships we consider are has instance, type of, member meronym, member 
holonym, part of, has part, subordinate instance of, domain region, synset domain region, similar 
to, domain topic. 

We compare our model with two models in Bordes et al. [9, 8], which have the same goal as ours.
The model of [9] has the following scoring function:

$$g(e_1, R, e_2) = \| W_{R,\text{left}}\, e_1 - W_{R,\text{right}}\, e_2 \|_1, \qquad (4)$$

where $W_{R,\text{left}}, W_{R,\text{right}} \in \mathbb{R}^{d \times d}$. The model of [8] also maps each relation type to an embedding
$e_R \in \mathbb{R}^d$ and scores the relationships by:

$$g(e_1, R, e_2) = -(W_1 e_1 \otimes W_{\text{rel},1}\, e_R + b_1) \cdot (W_2 e_2 \otimes W_{\text{rel},2}\, e_R + b_2), \qquad (5)$$

where $W_1, W_{\text{rel},1}, W_2, W_{\text{rel},2} \in \mathbb{R}^{d \times d}$ and $b_1, b_2 \in \mathbb{R}^{d \times 1}$. In the comparisons below, we call these
two models the similarity model and the Hadamard model respectively. While our function scores
correct triplets highly, these two models score correct triplets lower. All models are trained with a
contrastive max-margin objective function.
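The two baseline scoring functions can be sketched as follows. The function names are hypothetical, the bias vectors are treated as flat length-d arrays, and the product inside the second function is read as element-wise, an assumption suggested by the name "Hadamard model".

```python
import numpy as np

def similarity_score(e1, e2, W_left, W_right):
    """Eq. (4): L1 distance between the two projected entity vectors."""
    return float(np.sum(np.abs(W_left @ e1 - W_right @ e2)))

def hadamard_score(e1, e2, e_R, W1, W_rel1, b1, W2, W_rel2, b2):
    """Eq. (5): negated dot product of two relation-modulated entity vectors."""
    left = (W1 @ e1) * (W_rel1 @ e_R) + b1    # element-wise product (assumed)
    right = (W2 @ e2) * (W_rel2 @ e_R) + b2
    return float(-(left @ right))

# Toy check with identity projections (illustrative values only).
I = np.eye(3)
v = np.array([1.0, -2.0, 0.5])
w = np.array([0.0, 1.0, 2.0])
print(similarity_score(v, v, I, I))  # 0.0
print(similarity_score(v, w, I, I))  # 5.5
```

Unlike the NTN score, the similarity score is a distance, so identical projected vectors give the minimum value of zero.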

Our goal is to predict "correct" relations $(e_1, R, e_2)$ in the testing data. We can compute a score for
each triplet $(e_1, R, e_2)$. We can consider either just a classification accuracy result as to whether the
relation holds, or look at a ranking of $e_2$, for considering relative confidence in particular relations
holding. We use a different evaluation set from Bordes et al. [9] because it has become apparent to
us and them that there were issues of overlap between their training and testing sets which impacted
the quality and interpretability of their evaluation.

Ranking 

For each triplet $(e_1, R, e_2)$, we compute the score $g(e_1, R, e)$ for all other entities in the knowledge
base $e \in E$. We then sort values by decreasing order and report the rank of the correct entity $e_2$.

For WordNet the total number of entities is $|E| = 38{,}696$. Some of the questions relating to triplets
are of the form "A is a type of ?" or "A has instance ?" Since these have multiple correct answers,
we report the percentage of times that $e_2$ is ranked in the top 100 of the list (recall@100). The
higher this number, the more often the specific correct test entity has likely been correctly estimated.
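The ranking metric can be sketched as below, assuming higher scores mean more plausible triplets and that ties are resolved in favor of the correct entity. The function names and the toy score array are hypothetical.

```python
import numpy as np

def rank_of_correct(scores, correct_index):
    """1-based rank of the correct entity when scores are sorted in decreasing order."""
    return int(np.sum(scores > scores[correct_index])) + 1

def recall_at_k(all_scores, correct_indices, k=100):
    """Fraction of test triplets whose correct entity is ranked within the top k."""
    ranks = [rank_of_correct(s, c) for s, c in zip(all_scores, correct_indices)]
    return sum(r <= k for r in ranks) / len(ranks)

# Toy example: four candidate entities, two test triplets sharing the same scores.
scores = np.array([0.1, 0.9, 0.5, 0.7])
print(rank_of_correct(scores, 2))                # 3
print(recall_at_k([scores, scores], [2, 1], k=2))  # 0.5
```

For a model where lower scores indicate correct triplets, the comparison inside `rank_of_correct` would be reversed.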

After cross-validation of the hyperparameters of both models on the development fold, our neural 
tensor net obtains a ranking recall score of 20.9% while the similarity model achieves 10.6%, and the 
Hadamard model achieves only 7.4%. The best performance of the NTN with random initialization 
instead of the semantic vectors drops to 16.9% and the similarity model and the Hadamard model 
only achieve 5.7% and 7.1%. 

Classification 

In this experiment, we ask the model whether any arbitrary triplet of entities and relations is true or 
not. With the help of the large vocabulary of semantic word vectors, we can query whether certain 
WordNet relationships hold or not even for entities that were not originally in WordNet. 

We use the development fold to find a threshold $T_R$ for each relation such that if $g(e_1, R, e_2) > T_R$,
the relation $(e_1, R, e_2)$ holds, otherwise it is considered false. In order to create negative examples,
we randomly switch entities and relations from correct testing triplets, resulting in a total of
2 × 10,544 triplets. The final accuracy is based on how many of the triplets are classified correctly.
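Choosing a threshold for one relation on development data could be done as in the following sketch. The exhaustive search over observed scores is an assumption, since the text does not say how the threshold is found, and `best_threshold` and `classify` are hypothetical names.

```python
def best_threshold(scores, labels):
    """Pick the threshold T that maximizes accuracy of the rule `score > T`."""
    candidates = sorted(set(scores))
    # Also try a threshold below every score, so "all true" is a candidate.
    candidates = [candidates[0] - 1.0] + candidates
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((s > t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def classify(score, threshold):
    """A triplet is predicted to hold when its score exceeds the threshold."""
    return score > threshold

# Toy development scores for one relation (hypothetical values).
dev_scores = [0.2, 0.4, 0.6, 0.9]
dev_labels = [False, False, True, True]
threshold, accuracy = best_threshold(dev_scores, dev_labels)
print(threshold, accuracy)  # 0.4 1.0
```

In practice one threshold would be fitted per relation, for example by calling `best_threshold` on the development triplets of each relation separately.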

The Neural Tensor Network achieves an accuracy of 75.8% with semantically initialized entity
vectors and 70.0% with randomly initialized ones. In comparison, the similarity-based model achieves
only 66.7% and 51.6%, and the Hadamard model achieves 71.9% and 68.2% with the same setup. All
models improve in performance if entities are represented as an average of their word vectors but
we will leave experimentation with this setup to future work.

5 Conclusion 

We introduced a new model based on Neural Tensor Networks. Unlike previous models for predict- 
ing relationships purely using entity representations in knowledge bases, our model allows direct 
interaction of entity vectors via a tensor. This architecture allows for much better performance in 
terms of both ranking correct answers out of tens of thousands of possible ones and predicting unseen 
relationships between entities. It enables the extension of databases even without external textual 
resources but can also benefit from unsupervised large corpora even without manually designed 
extraction rules. 